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Given multivariate time series, we study the problem of forming portfolios with maximum 
mean reversion while constraining the number of assets in these portfolios. We show that it 
can be formulated as a sparse canonical correlation analysis and study various algorithms to 
solve the corresponding sparse generalized eigenvalue problems. After discussing penalized 
parameter estimation procedures, we study the sparsity versus predictability tradeoff and the 
impact of predictability in various markets. 
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Mean reversion has received a significant amount of attention as a classic indicator of predictability 
in financial markets and is sometimes apparent, for example, in equity excess returns over long 
horizons. While mean reversion is easy to identify in univariate time series, isolating portfolios of 
assets exhibiting significant mean reversion is a much more complex problem. Classic solutions 
include cointegration or canonical correlation analysis, which will be discussed in what follows. 

One of the key shortcomings of these methods though is that the mean reverting portfolios they 
identify are dense, i.e. they include every asset in the time series analyzed. For arbitrageurs, this 
means that exploiting the corresponding statistical arbitrage opportunities involves considerable 
transaction costs. From an econometric point of view, this also impacts the interpretability of 
the resulting portfolio and the significance of the structural relationships it highlights. Finally, 
optimally mean reverting portfolios often behave like noise and sometimes vary well inside bid- 
ask spreads, hence do not form meaningful statistical arbitrage opportunities. 

Here, we would like to argue that seeking sparse portfolios instead, i.e. optimally mean re- 
verting portfolios with a few assets, solves many of these issues at once: fewer assets means less 
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transaction costs and more interpretable results. In practice, the tradeoff between mean reversion 
and sparsity is often very favorable. Furthermore, penalizing for sparsity also makes sparse port- 
folios vary in a wider price range, so the market inefficiencies they highlight are more significant. 

Remark that all statements we will make here on mean reversion apply symmetrically to mo- 
mentum. Finding mean reverting portfolios using canonical correlation analysis means minimizing 
predictability, while searching for portfolios with strong momentum can also be done using canon- 
ical correlation analysis, by maximizing predictability. The numerical procedures involved are 
identical. 

Mean reversion has of course received a considerable amount of attention in the literature, most 
authors, such as Fama & French (1988), Poterba & Summers (1988) among many others, using 
it to model and test for predictability in excess returns. Cointegration techniques (see Engle & 
Granger (1987), and Alexander (1999) for a survey of applications in finance) are often used to 
extract mean reverting portfolios from multivariate time series. Early methods relied on a mix 
of regression and Dickey & Fuller (1979) stationarity tests or Johansen (1988) type tests but it 
was subsequently discovered that an earlier canonical decomposition technique due to Box & Tiao 
(1977) could be used to extract cointegrated vectors by solving a generalized eigenvalue problem 
(see Bewley, Orden, Yang & Fisher (1994) for a more complete discussion). 

Several authors then focused on the optimal investment problem when excess returns are mean 
reverting, with Kim & Omberg (1996) and Campbell & Viceira (1999) or Wachter (2002) for ex- 
ample obtaining closed-form solutions in some particular cases. Liu & Longstaff (2004) also study 
the optimal investment problem in the presence of a "textbook" finite horizon arbitrage opportunity, 
modeled as a Brownian bridge, while Jurek & Yang (2006) study this same problem when the arbi- 
trage horizon is indeterminate. Gatev, Goetzmann & Rouwenhorst (2006) studied the performance 
of pairs trading, using pairs of assets as classic examples of structurally mean-reverting portfolios. 
Finally, the LTCM meltdown in 1998 focused a lot of attention on the impact of leverage limits 
and liquidity, see Grossman & Vila (1992) or Xiong (2001) for a discussion. 

Sparse estimation techniques in general and the i x penalization approach we use here in partic- 
ular have also received a lot of attention in various forms: variable selection using the LASSO (see 
Tibshirani (1996)), sparse signal representation using basis pursuit by Chen, Donoho & Saunders 
(2001), compressed sensing (see Donoho & Tanner (2005) and Candes & Tao (2005)) or covari- 
ance selection (see Banerjee, Ghaoui & d' Aspremont (2007)), to cite only a few examples. A recent 
stream of works on the asymptotic consistency of these procedures can be found in Meinshausen & 
Yu (2007), Candes & Tao (2007), Banerjee et al. (2007), Yuan & Lin (2007) or Rothman, Bickel, 
Levina & Zhu (2007) among others. 

In this paper, we seek to adapt these results to the problem of estimating sparse (i.e. small) 
mean reverting portfolios. Suppose that Su is the value at time t of an asset Si with % — 1, . . . , n 
and t = 1, ... ,m, we form portfolios P t of these assets with coefficients Xi, and assume they 
follow an Ornstein-Uhlenbeck process given by: 



where Z t is a standard Brownian motion. Our objective here is to maximize the mean reversion 
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coefficient A of P t by adjusting the portfolio weights Xj, under the constraints that ||x|| = 1 and 
that the cardinality of x, i.e. the number of nonzero coefficients in x, remains below a given k > 0. 

Our contribution here is twofold. First, we describe two algorithms for extracting sparse mean 
reverting portfolios from multivariate time series. One is based on a simple greedy search on 
the list of assets to include. The other uses semidefinite relaxation techniques to directly get good 
solutions. Both algorithms use predictability in the sense of Box & Tiao (1977) as a proxy for mean 
reversion in ([]])• Second, we show that penalized regression and covariance selection techniques 
can be used as preprocessing steps to simultaneously stabilize parameter estimation and highlight 
key dependence relationships in the data. We then study the sparsity versus mean reversion tradeoff 
in several markets, and examine the impact of portfolio predictability on market efficiency using 
classic convergence trading strategies. 

The paper is organized as follows. In Section [2l we briefly recall the canonical decomposition 
technique derived in Box & Tiao (1977). In Section [3j we adapt these results and produce two 
algorithms to extract small mean reverting portfolios from multivariate data sets. In Section HI 
we then show how penalized regression and covariance selection techniques can be used as pre- 
processing tools to both stabilize estimation and isolate key dependence relationships in the time 
series. Finally, we present some empirical results in Section [5] on U.S. swap rates and foreign 
exchange markets. 



2 Canonical decompositions 

We briefly recall below the canonical decomposition technique derived in Box & Tiao (1977). 
Here, we work in a discrete setting and assume that the asset prices follow a stationary vector 
autoregressive process with: 

St = S t - 1 A + Z t , (2) 

where S t -i is the lagged portfolio process, A E R nxn anc i z t is a vector of i.i.d. Gaussian noise 
with zero mean and covariance £ 6 S n , independent of St-%. Without loss of generality, we can 
assume that the assets St have zero mean. The canonical analysis in Box & Tiao (1977) starts as 
follows. For simplicity, let us first suppose that n — 1 in equation (O, to get: 

E[S!]^B[(S t ^A) 2 ]+B[Z^ 

which can be rewritten as of = o1_ x + E. Box & Tiao (1977) measure the predictability of 
stationary series by: 



The intuition behind this variance ratio is very simple: when it is small the variance of the noise 
dominates that of S t -i and S t is almost pure noise, when it is large however, S t -i dominates the 
noise and St is almost perfectly predictable. Throughout the paper, we will use this measure of 
predictability as a proxy for the mean reversion parameter A in (Q~|). Consider now a portfolio 
P t = S t x with weights x G R", using © we know that S t x = S t -\Ax + Z t x, and we can measure 
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its predicability as: 

x T A T TAx 



v(x) 



x T Tx ' 

where T is the co variance matrix of S t . Minimizing predictability is then equivalent to finding the 
minimum generalized eigenvalue A solving: 

det(Ar - A T YA) = 0. (4) 

Assuming that T is positive definite, the portfolio with minimum predictability will be given by 
x = T~ l / 2 z, where z is the eigenvector corresponding to the smallest eigenvalue of the matrix: 

Y- l ' 2 A T YAY- 1 / 2 . (5) 

We must now estimate the matrix A. Following Bewley et al. (1994), equation © can be written: 

St — St + Zt, 

where S t is the least squares estimate of St with S t = S t -\A and we get: 

A={Sl l S t - i y l Sl.St. (6) 

The Box & Tiao (1977) procedure then solves for the optimal portfolio by inserting this estimate 
in the generalized eigenvalue problem above. 

Box & Tiao procedure. Using the estimate © in © and the stationarity of S t , the Box & 
Tiao (1977) procedure finds linear combinations of the assets ranked in order of predictability by 
computing the eigenvectors of the matrix: 

(SjS t )- 1/2 (sJSt)(SjSt)- 1/2 (7) 

where S t is the least squares estimate computed above. Figure Q] gives an example of a Box & Tiao 
(1977) decomposition on U.S. swap rates and shows eight portfolios of swap rates with maturities 
ranging from one to thirty years, ranked according to predictability. Tabled] shows mean reversion 
coefficient, volatility and the p-value associated with the mean reversion coefficient. We see that 
all mean reversion coefficients are significant at the 99% level except for the last portfolio. For this 
highly mean reverting portfolio, a mean reversion coefficient of 238 implies a half-life of about 
one day, which explains the lack of significance on daily data. 

Bewley et al. (1994) show that the canonical decomposition above and the maximum likelihood 
decomposition in Johansen (1988) can both be formulated in this manner. We very briefly recall 
their result below. 
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Number of swaps: 


1 


2 


3 


4 


5 


6 


7 


8 


Mean reversion 


0.58 


8.61 


16.48 


38.59 


84.55 


174.82 


184.83 


238.11 


P-value 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.51 


Volatility 


0.21 


0.28 


0.34 


0.14 


0.10 


0.09 


0.07 


0.07 



Table 1 : Summary statistics for canonical U.S. swap portfolios: mean reversion coefficient, volatil- 
ity and the p-value associated with the mean reversion coefficient for portfolio sizes ranging from 
one to eight. 



Johansen procedure. Following Bewley et al. (1994), the maximum likelihood procedure for 
estimating cointegrating vectors derived in Johansen (1988) and Johansen (1991) can also be writ- 
ten as a canonical decomposition a la Box & Tiao (1977). Here however, the canonical analysis 
is performed on the first order differences of the series S t and their lagged values S t -i. We can 
rewrite equation © as: 

AS t = QS t -! + Z t 

where Q = A — I. The basis of (potentially) cointegrating portfolios is then found by solving the 
following generalized eigenvalue problem: 

A(Sf_A-i) - OS^AS^ASf AS^ASf £U) (8) 

in the variable A e R. 



3 Sparse decomposition algorithms 

In the previous section, we have seen that canonical decompositions can be written as generalized 
eigenvalue problems of the form: 

det(XB-A) = (9) 

in the variable A G R, where A,B G S" are symmetric matrices of dimension n. Full generalized 
eigenvalue decomposition problems are usually solved using a QZ decomposition. Here however, 
we are only interested in extremal generalized eigenvalues, which can be written in variational 
form as: 

A max (AjB)=max ^_^ 
x-GR" X T Bx 

In this section, we will seek to maximize this ratio while constraining the cardinality of the (port- 
folio) coefficient vector x and solve instead: 

maximize x T Ax/x T Bx 

subject to Card(x) < k (10) 

IMI = i> 

where k > is a given constant and Card(x) is the number of nonzero coefficients in x. This will 
compute a sparse portfolio with maximum predictability (or momentum), a similar problem can 
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Figure 1: Box-Tiao decomposition on 100 days of U.S. swap rate data (in percent). The eight 
canonical portfolios of swap rates with maturities ranging from one to thirty years are ranked in 
decreasing order of predictability. The mean reversion coefficient A is listed below each plot. 
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be formed to minimize it (and obtain a sparse portfolio with maximum mean reversion). This is 
a hard combinatorial problem, in fact, Natarajan (1995) shows that sparse generalized eigenvalue 
problems are equivalent to subset selection, which is NP-hard. We can't expect to get optimal 
solutions and we discuss below two efficient techniques to get good approximate solutions. 

3.1 Greedy search 

Let us call Ik the support of the solution vector x given k > in problem (flOl) : 

h = {i e [l,n] : Xi ^ 0}, 

by construction \Ik\ < k. We can build approximate solutions to (flOl) recursively in k. When 
k — 1, we simply find I\ as: 

h = aigmsLxAu/Bii. 

ie[l,n] 

Suppose now that we have a good approximate solution with support set Ik given by: 

x k = argmax , 

{x£R n :xjc=0}X ox 

where 1% is the complement of the set Ik- This can be solved as a generalized eigenvalue problem 
of size k. We seek to add one variable with index i k+ i to the set Ik to produce the largest increase 
in predictability by scanning each of the remaining indices in The index i k +\ is then given by: 

i k +i = argmax max ^ , where Ji = It\ {i}, 

i6 /c {x&R n :xj x =0} X 1 BX 

which amounts to solving [n — k) generalized eigenvalue problems of size k + 1. We then define: 

Ik+l = Ik U {^fc+l}, 

and repeat the procedure until k = n. Naturally, the optimal solutions of problem (flOl) might not 
have increasing support sets Ik C Ik+i, hence the solutions found by this recursive algorithm are 
potentially far from optimal. However, the cost of this method is relatively low: with each iteration 
costing 0(k 2 (n — k)), the complexity of computing solutions for all target cardinalities k is 0(n 4 ). 
This recursive procedure can also be repeated forward and backward to improve the quality of the 
solution. 

3.2 Semidefinite relaxation 

An alternative to greedy search which has proved very efficient on sparse maximum eigenvalue 
problems is to derive a convex relaxation of problem (flOl) . In this section, we extend the techniques 
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of d'Aspremont, El Ghaoui, Jordan & Lanckriet (2007) to formulate a semidefinite relaxation for 
sparse generalized eigenvalue problems in (TTOT ): 



maximize x T Ax/x T Bx 
subject to Card(x) < k 

\\x\\ = 1, 

with variable x G R™. As in d'Aspremont et al. (2007), we can form an equivalent program in 
terms of the matrix X = xx T G S n : 

maximize Tr(AX)/Tr(BX) 
subject to Card(X) < k 2 
Tr(X) = 1 

X y 0, Rank(X) = 1, 

in the variable X G S n . This program is equivalent to the first one: indeed, if X is a solution to 
the above problem, then X y and Rank(X) = 1 mean that we must have X = xx T , while 
Tr(X) = 1 implies that ||x|| = 1. Finally, if X = xx T then Card(X) < k 2 is equivalent to 
Card(x) < k. 

Now, because for any vector u G R n , Card(u) = q implies \\u\\x < y/q\\u\\2, we can replace 
the nonconvex constraint Card(X) < k 2 by a weaker but convex constraint 1 T |X|1 < k, using 
the fact that \\X\\p = Vx T x = 1 when X = xx T and Tr(X) = 1. We then drop the rank 
constraint to get the following relaxation of (fTOT ): 

maximize Tr(AX)/Tr(BX) 
subjectto l T \X\l<k 

Tr(X) = l ( } 

x y o, 

which is a quasi-convex program in the variable X G S n . After the following change of variables: 

Y X 



Tr{BXy Tr{BXy 
and rewrite (fTTT) as: 

maximize Tr (AY) 
subjectto l T \Y\l-kz<0 

Tr(Y)-z = (12) 

Tr(BY) = 1 

y y o, 

which is a semidefinite program (SDP) in the variables Y G S n and z G R + and can be solved using 
standard SDP solvers such as SEDUMI by Sturm (1999) and SDPT3 by Toh, Todd & Tutuncu 
(1999). The optimal value of problem ([12]) will be an upper bound on the optimal value of the 
original problem (fTOl) . If the solution matrix Y has rank one, then the relaxation is tight and both 
optimal values are equal. When Rank(F) > 1 at the optimum in (fT2~l) . we get an approximate 
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solution to (flOl) using the rescaled leading eigenvector of the optimal solution matrix Y in (fT2l) . 
The computational complexity of this relaxation is significantly higher than that of the greedy 
search algorithm in ^3.11 On the other hand, because it is not restricted to increasing sequences of 
sparse portfolios, the performance of the solutions produced is often higher too. Furthermore, the 
dual objective value produces an upper bound on suboptimality. Numerical comparisons of both 
techniques are detailed in Section [5J 

4 Parameter estimation 

The canonical decomposition procedures detailed in Section|2]all rely on simple estimates of both 
the co variance matrix T in © and the parameter matrix A in the vector autoregressive model ©. 
Of course, both estimates suffer from well-known stability issues and a classic remedy is to pe- 
nalize the covariance estimation using, for example, a multiple of the norm of T. In this section, 
we would like to argue that using an i± penalty term to stabilize the estimation, in a procedure 
known as covariance selection, simultaneously stabilizes the estimate and helps isolate key id- 
iosyncratic dependencies in the data. In particular, covariance selection clusters the input data in 
several smaller groups of highly dependent variables among which we can then search for mean 
reverting (or momentum) portfolios. Covariance selection can then be viewed as a preprocessing 
step for the sparse canonical decomposition techniques detailed in Section |3] Similarly, penalized 
regression techniques such as the LASSO by Tibshirani (1996) can be used to produce stable, 
structured estimates of the matrix parameter A in the VAR model ©• 

4.1 Covariance selection 

Here, we first seek to estimate the covariance matrix Y by maximum likelihood. Following Demp- 
ster (1972), we penalize the maximum- likelihood estimation to set a certain number of coefficients 
in the inverse covariance matrix to zero, in a procedure known as covariance selection. Zeroes 
in the inverse covariance matrix correspond to conditionally independent variables in the model 
and this approach can be used to simultaneously obtain a robust estimate of the covariance matrix 
while, perhaps more importantly, discovering structure in the underlying graphical model (see Lau- 
ritzen (1996) for a complete treatment). This tradeoff between log-likelihood of the solution and 
number of zeroes in its inverse (i.e. model structure) can be formalized in the following problem: 

max logdetX - Tr(SX) - pCard(X) (13) 

X 

in the variable X E S n , where £ G S n is the sample covariance matrix, Card(X) is the cardinality 
of X, i.e. the number of nonzero coefficients in X and p > is a parameter controlling the trade- 
off between likelihood and structure. 

Solving the penalized maximum likelihood estimation problem in (fT3l both improves the sta- 
bility of this estimation procedure by implicitly reducing the number of parameters and directly 
highlights structure in the underlying model. Unfortunately, the cardinality penalty makes this 
problem very hard to solve numerically. One solution developed in d'Aspremont, Banerjee & 
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Figure 2: Left: conditional dependence network inferred from the pattern of zeros in the inverse 
swap covariance matrix. Right: same plot, using this time the penalized covariance estimate with 
penalty p = .1 in the maximum likelihood estimation (PT4l) . 



El Ghaoui (2006), Banerjee et al. (2007) or Friedman, Hastie & Tibshirani (2007) is to relax the 
Card(X) penalty and replace it by the (convex) l\ norm of the coefficients of X to solve: 



in the variable X £ S n . The penalty term involving the sum of absolute values of the entries of 
X acts as a proxy for the cardinality: the function j=i l-^y I can be seen as the largest convex 
lower bound on Card(X) on the hypercube, an argument used by Fazel, Hindi & Boyd (2001) 
for rank minimization. It is also often used in regression and variable selection procedures, such 
as the LASSO by Tibshirani (1996). Other permutation invariant estimators have been detailed in 
Rothman et al. (2007) for example. 

In a Gaussian model, zeroes in the inverse covariance matrix point to variables that are condi- 
tionally independent, conditioned on all the remaining variables. This has a clear financial interpre- 
tation: the inverse covariance matrix reflects independence relationships between the idiosyncratic 
components of asset price dynamics. In Figure [2j we plot the resulting network of dependence, or 
graphical model for U.S. swap rates. In this graph, variables (nodes) are joined by a link if and 
only if they are conditionally dependent. We plot the graphical model inferred from the pattern of 
zeros in the inverse sample swap covariance matrix (left) and the same graph, using this time the 
penalized covariance estimate in ([141) with penalty parameter p — .1 (right). The graph layout was 
done using Cytoscape. Notice that in the penalized estimate, rates are clustered by maturity and 
the graph clearly reveals that swap rates are moving as a curve. 
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x 




(14) 
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4.2 Estimating structured VAR models 



In this section, using similar techniques, we show how to recover a sparse vector autoregressive 
model from multivariate data. 

Endogenous dependence models. Here, we assume that the conditional dependence structure of 
the assets S t is purely endogenous, i.e. that the noise terms in the vector autoregressive model © 
are i.i.d. with: 

S t = St-iA + Z t , 
where Z t ~ A/"(0, crl) for some a > 0. In this case, we must have: 

T = A T TA + crl 

since A T <g> A has no unit eigenvalue (by stationarity), this means that: 

r/a = (I-A T ^A T )- 1 I 
where A ® B is the Kronecker product of A and 5, which implies: 

A T A = l-aT-\ 

We can always choose a small enough so that I — oT _1 y 0. This means that we can directly get A 
as a matrix square root of (I — oT -1 ). Furthermore, if we pick A to be the Cholesky decomposition 
of (I — 0T _1 ), and if the graph of T is chordal (i.e. has no cycles of length greater than three) then 
there is a permutation of the variables P such that the Cholesky decomposition of PTP T , and the 
upper triangle of PTP T have the same pattern of zeroes (see Wermuth (1980) for example). In 
Figure SI we plot two dependence networks, one chordal (on the left), one not (on the right). In 
this case, the structure (pattern of zeroes) of A in the VAR model © can be directly inferred from 
that of the penalized covariance estimate. 

Gilbert (1994, §2.4) also shows that if A satisfies A T A = I — oT _1 then, barring numerical 
cancellations in A T A, the graph of is the intersection graph of A so: 

(r _1 )y = =>• A kl A kj = 0, for all A; = 1, ... ,n. 

This means in particular that when the graph of T is disconnected, then the graph of A must also 
be disconnected along the same clusters of variables, i.e. A and T have identical block-diagonal 
structure. In §4.31 we will use this fact to show that when the graph of T is disconnected, optimally 
mean reverting portfolios must be formed exclusively of assets within a single cluster of this graph. 

Exogenous dependence models. In the general case where the noise terms are correlated, with 
Zt ~ A/"(0, E) for a certain noise covariance S, and the dependence structure is partly exogenous, 
we need to estimate the parameter matrix A directly from the data. In Section [2l we estimated the 
matrix A in the vector autoregressive model © by regressing S t on S t -i'. 
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Figure 3: Left: a chordal graphical model: no cycles of length greater than three. Right: a non- 
chordal graphical model. 



Here too, we can modify this estimation procedure in order to get a sparse model matrix A. Our 
aim is again to both stabilize the estimation and highlight key dependence relationships between 
St and S t -i- We replace the simple least-squares estimate above by a penalized one. We get the 
columns of A by solving: 

a, = argmin \\S it - S t ~ix\\ 2 + 7||x||i (15) 

X 

in the variable x £ R n , where the parameter A > controls sparsity. This is known as the LASSO 
(see Tibshirani (1996)) and produces sparse least squares estimates. 

4.3 Canonical decomposition with penalized estimation 

We showed that covariance selection highlights networks of dependence among assets, and that pe- 
nalized regression could be used to estimate sparse model matrices A. We now show under which 
conditions these results can be combined to extract information on the support of the canonical 
portfolios produced by the decompositions in Section |2] from the graph structure of the covariance 
matrix T and of the model matrix A. Because both covariance selection and the lasso are substan- 
tially cheaper numerically than the sparse decomposition techniques in Section |3l our goal here is 
to use these penalized estimation techniques as preprocessing tools to narrow down the range of 
assets over which we look for mean reversion. 

In Section |2l we saw that the Box & Tiao (1977) decomposition for example, could be formed 
by solving the following generalized eigenvalue problem: 

det(Ar - A T TA) = 0, 
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Figure 4: Left: a connected graphical model. Right: disconnected models. 



where Y is the covariance matrix of the assets S t and A is the model matrix in ©. Suppose now 
that our penalized estimates of the matrices T and A T TA have disconnected graphs with identical 
clusters, i.e. have the same block diagonal structure, Gilbert (1994, Th. 6.1) shows that the support 
of the generalized eigenvectors of the pair {T, A T TA} must be fully included in one of the clusters 
of the graph of the inverse covariance r -1 . In other words, if the graph of the penalized estimate of 
r _1 and A are disconnected along the same clusters, then optimally unpredictable (or predictable) 
portfolios must be formed exclusively of assets in a single cluster. 

This suggests a simple procedure for finding small mean reverting portfolios in very large 
data sets. We first estimate a sparse inverse covariance matrix by solving the covariance selection 
problem in (fT4"l) . setting p large enough so that the graph of is split into sufficiently small 
clusters. We then check if either the graph is chordal or if penalized estimates of A share some 
clusters with the graph of After this preprocessing step, we use the algorithms of Section |3] 
to search these (much smaller) clusters of variables for optimal mean reverting (or momentum) 
portfolios. 

5 Empirical results 

In this section, we first compare the performance of the algorithms described in Section[3] We then 
study the mean reversion versus sparsity tradeoff on various financial instruments. Finally, we test 
the performance of convergence trading strategies on sparse mean reverting portfolios. 
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5.1 Numerical performance 



In FigureQ]we plotted the result of the Box-Tiao decomposition on U.S. swap rate data (see details 
below). Each portfolio is a dense linear combination of swap rates, ranked in decreasing order of 
predictability. In Figure [6l we apply the greedy search algorithm detailed in Section [3] to the same 
data set and plot the sparse portfolio processes for each target number of assets. Each subplot of 
Figure [6] lists the number k of nonzero coefficients of the corresponding portfolio and its mean 
reversion coefficient A. Figure [5] then compares the performance of the greedy search algorithm 
versus the semidefinite relaxation derived in Section [3j On the left, for each algorithm, we plot 
the mean reversion coefficient A versus portfolio cardinality (number of nonzero coefficients). We 
observe on this example that while the semidefinite relaxation does produce better results in some 
instances, the greedy search is more reliable. Of course, both algorithms recover the same solutions 
when the target cardinality is set to k = 1 or k = n. On the right, we plot CPU time (in seconds) 
as a function of the total number of assets to search. As a quick benchmark, producing 100 sparse 
mean reverting portfolios for each target cardinality between 1 and 100 took one minute and forty 
seconds. 




Cardinality Total number of assets 



Figure 5: Left: Mean reversion coefficient A versus portfolio cardinality (number of nonzero co- 
efficients) using the greedy search (circles, solid line) and the semidefinite relaxation (squares) 
algorithms on U.S. swap rate data. Right: CPU time (in seconds) versus total number of assets n 
to compute a full set of sparse portfolios (with cardinality ranging from 1 to n) using the greedy 
search algorithm. 



5.2 Mean reversion versus sparsity 

In this section, we study the mean reversion versus sparsity tradeoff on several data sets. We also 
test the persistence of this mean reversion out of sample. 
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Figure 6: Sparse canonical decomposition on 100 days of U.S. swap rate data (in percent). The 
number of nonzero coefficients in each portfolio vector is listed as k on top of each subplot, while 
the mean reversion coefficient A is listed below each one. 
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Swap rates. In Figure[7Jwe compare in and out of sample estimates of the mean reversion versus 
cardinality tradeoff. We study U.S. swap rate data for maturities 1Y, 2Y, 3Y, 4Y, 5Y, 7Y, 10Y and 
30Y from 1998 until 2005. We first use the greedy algorithm of Section [3] to compute optimally 
mean reverting portfolios of increasing cardinality for time windows of 200 days and repeat the 
procedure every 50 days. We plot average mean reversion versus cardinality in Figure [TJ on the 
left. We then repeat the procedure, this time computing the (out of sample) mean reversion in the 
200 days time window immediately following our sample and also plot average mean reversion 
versus cardinality. In Figure [7J on the right, we plot the out of sample portfolio price range (spread 
between min. and max. in basis points) versus cardinality (number of nonzero coefficients) on the 
same U.S. swap rate data. Table [2] shows the portfolio composition for each target cardinality. 
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-0.031 


0.025 


-0.130 
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-0.007 


0.016 


-0.008 


0.014 



Table 2: Composition of optimal swap portfolios for various target cardinalities. 



Foreign exchange rates. We study the following U.S. dollar exchange rates: Argentina, Aus- 
tralia, Brazil, Canada, Chile, China, Colombia, Czech Republic, Egypt, Eurozone, Finland, Hong 
Kong, Hungary, India, Indonesia, Israel, Japan, Jordan, Kuwait, Latvia, Lithuania, Malaysia, Mex- 
ico, Morocco, New Zealand, Norway, Pakistan, Papua NG, Peru, Philippines, Poland, Romania, 
Russia, Saudi Arabia, Singapore, South Africa, South Korea, Sri Lanka, Switzerland, Taiwan, 
Thailand, Turkey, United Kingdom, Venezuela, from April 2002 until April 2007. Note that ex- 
change rates are quoted with four digits of accuracy (pip size), with bid-ask spreads around 0.0005 
for key rates. 

After forming the sample covariance matrix £ of these rates, we solve the covariance selection 
problem in (fT4l) . This penalized maximum likelihood estimation problem isolates a cluster of 14 
rates and we plot the corresponding graph of conditional co variances in Figure [8l For these 14 
rates, we then study the impact of penalized estimation of the matrices T and A on out of sample 
mean reversion. In Figure |9l we plot out of sample mean reversion coefficient A versus portfolio 
cardinality, on 14 rates selected by covariance selection. The sparse canonical decomposition 
was performed on both unpenalized estimates and penalized ones. The covariance matrix was 
estimated by solving the covariance selection problem (PT41) with p = 0.01 and the matrix A in © 
was estimated by solving problem (fT5l) with the penalty 7 set to zero out 20% of the regression 
coefficients. 
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We notice in Figure[9]that penalization has a double impact. First, the fact that sparse portfolios 
have a higher out of sample mean reversion than dense ones means that penalizing for sparsity 
helps prediction. Second, penalized estimates of T and A also produce higher out of sample mean 
reversion than unpenalized ones. In Figure [9] on the right, we plot portfolio price range versus 
cardinality and notice that here too sparse portfolios have a significantly broader range of variation 
than dense ones. 

5.3 Convergence trading 

Here, we measure the performance of the convergence trading strategies detailed in the appendix. 
In Figure [10] we plot average out of sample sharpe ratio versus portfolio cardinality on a 50 days 
(out of sample) time window immediately following the 100 days over which we estimate the 
process parameters. Somewhat predictably in the very liquid U.S. swap markets, we notice that 
while out of sample Sharpe ratios look very promising in frictionless markets, even minuscule 
transaction costs (a bid-ask spread of lbp) are sufficient to completely neutralize these market 
inefficiencies. 

6 Conclusion 

We have derived two simple algorithms for extracting sparse (i.e. small) mean reverting portfo- 
lios from multivariate time series by solving a penalized version of the canonical decomposition 
technique in Box & Tiao (1977). Empirical results suggest that these small portfolios present a 
double advantage over their original dense counterparts: sparsity means lower transaction costs 
and better interpretability, it also improved out-of-sample predictability in the markets studied in 
Section [5J Several important issues remain open at this point. First, it would be important to show 
consistency of the variable selection procedure: assuming we know a priori that only a few vari- 
ables have economic significance (i.e. should appear in the optimal portfolio), can we prove that 
the sparse canonical decomposition will recover them? Very recent consistency results by Amini 
& Wainwright (2008) on the sparse principal component analysis relaxation in d'Aspremont et al. 
(2007) seem to suggest that this is likely, at least for simple models. Second, while the dual of the 
semidefinite relaxation in ([111) provides a bound on suboptimality, we currently have no procedure 
for deriving simple bounds of this type for the greedy algorithm in Section [3T1 
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Appendix 



In the previous sections, we showed how to extract small mean reverting (or momentum) portfolios 
from multivariate asset time series. In this section we assume that we have identified such a mean 
reverting portfolio and model its dynamics given by: 

dP t = A(P - P t )dt + adZ t , (16) 

In this section, we detail how to optimally trade these portfolios under various assumptions re- 
garding market friction and risk- management constraints. We begin by quickly recalling results on 
estimating the Ornstein-Uhlenbeck dynamics in (fT6l) . 



Estimating Ornstein-Uhlenbeck processes 

By explicitly integrating the process P t in (fT6l) over a time increment At we get: 

Pt = P + e~ XAt (Pt-At -P) + <y f e x ^dZ s , (17) 

Jt-At 

which means that we can estimate A and a by simply regressing P t on P t -\ and a constant. With 



z^-^dZ* 



t-At 



1-e 



-2XAt 



2A 



we get the following estimators for the parameters of P t : 



fi 



1 N 

i=0 

— — log 
At 



a 



2A 



A? 



\ (l-e-2AA t)(iV _2)^ 



Y,{{Pt-p)-e-^{P t -il)f 



where At is the time interval between times t and t — 1. The expression in (fTTT) also allows us to 
compute the half -life of a market shock on P t as: 

log 2 

r=-f-, (18) 
which is a more intuitive measure of the magnitude of the portfolio's mean reversion. 
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Utility maximization in frictionless markets 

Suppose now that an agent invests in an asset P t and in a riskless bond B t following: 

dB t = rB t dt, 

the wealth W t of this agent will follow: 

dW t = N t dP t + (W t - N t P t )rdt. 
If P t follows a mean reverting process given by (fT6l) , this is also: 

dW t = (r(W t - N t P t ) + A(P - P t )N t )dt + N t adZ t . 
If we write the value function: 

V(W t ,P t ,t) = maxEi [ e -^ T -^U(W t )] , 

the H.J.B. equation for this problem can be written: 

dV - dV - dV 

PV = max—\(P t -P t ) + —(r(W t -N t P t ) + \(P-P t )N t ) + — 



N t OP K ' dW K K ' K ' ' dt 

1 d 2 V 2 1 d 2 V Ar 2 1 d 2 V Ar2 2 
+ 2dP^ a + 2dPdW Nta + 2dW^ a 

Maximizing in N t yields the following expression for the number of shares in the optimal portfolio: 

dV/dW /w - „. d 2 V/dPdW 

Jurek & Yang (2006) solve this equation explicitly for U(x) = logx and U(x) = x 1 ^ 1 /(l — 7) 
and we recover in particular the classic expression: 

N, = ( X(P - P t rP ' ) W„ 

in the log-utility case. 

Leverage constraints 

Suppose now that the portfolio is subject to fund withdrawals so that the total wealth evolves 
according to: 

dW = dU + dF 

where dli = N t dP t + (W t — N t P t )rdt and dF represents fund flows, with: 

dF = fdU + a f dZ[ 2) 
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where is a Brownian motion (independent of Z t ). Jurek & Yang (2006) show that the optimal 
portfolio allocation can also be computed explicitly in the presence of fund flows, with: 

in the log-utility case. Note that the constant / can also be interpreted in terms of leverage limits. 
In steady state, we have: 

which means that the leverage L t itself is normally distributed. If we assume for simplicity that 
P — 0, given the fund flow parameter /, the leverage will remain below the level M given by: 

M = " (A + r ^- (20) 

with confidence level N(a), where N(x) is the Gaussian CDF. The bound on leverage M can thus 
be seen as an alternate way of identifying or specifying the fund flow constant / in order to manage 
capital outflow risks. 
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Cardinality Cardinality 

Figure 7: Left: mean reversion coefficient A versus portfolio cardinality (number of nonzero coef- 
ficients), in sample (blue circles) and out of sample (black squares) on U.S. swaps. Right: out of 
sample portfolio price range (in basis points) versus cardinality (number of nonzero coefficients) 
on U.S. swap rate data. The dashed lines are at plus and minus one standard deviation. 




Figure 8: Graph of conditional covariance among a cluster of U.S. dollar exchange rates. Posi- 
tive dependencies are plotted as green links, negative ones in red, while the thickness reflects the 
magnitude of the covariance. 
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Figure 9: Left: out of sample mean reversion coefficient versus portfolio cardinality (number of 
nonzero coefficients), on 14 U.S. dollar exchange rates clustered by covariance selection. The 
sparse canonical decomposition was performed on both unpenalized estimates (black squares) and 
penalized ones (blue circles). Right: out of sample portfolio price range (in percent) versus cardi- 
nality. The dashed lines are at plus and minus one standard deviation. 
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Figure 10: Left: average out of sample sharpe ratio versus portfolio cardinality on U.S. swaps. 
Right: idem, with a bid-ask spread of lbp. The dashed lines are at plus and minus one standard 
deviation. 
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